REMARKS 

In response to the above-identified Final Office Action, Applicant amends the Application 
and seeks reconsideration in view of the following remarks. In this Response, Applicant amends 
claims 1, 3-5, 7, 9-11, 13-15, 17, and 19-23. Applicant does not cancel or add any claims. 
Accordingly, claims 1-5, 7-15, 17, and 19-23 remain pending in the Application. 

I. Amendments to the Specification 

Applicant submits a substitute specification. The substitute specification amends the 
originally-filed specification by capitalizing the trademarked term INFINIBAND. Applicant's 
specification at paragraph [0019] provides generic terminology for the term "INFINIBAND" in 
accordance with MPEP § 608.01(v). The substitute specification also includes amendments to 
correct other inadvertent typographical and grammatical errors. Applicant submits that no new 
matter is added by the amendments to the specification. 

II. Claims Rejected Under 35 U.S.C. § 112 

Claims 21-23 stand rejected under 35 U.S.C. § 112, second paragraph, as being allegedly 
indefinite. Specifically, claims 21-23 recite the trademarked term "INFINIBAND" (see Paper 
No./Mail Date 20071102, pages 2-3). Applicant respectfully traverses the rejection. 

MPEP § 608.01(v) states that if a "trademark has a fixed and definite meaning, it constitutes 
sufficient identification unless some physical or chemical characteristic of the article or material is 
involved in the invention." Applicant submits that one skilled in the art attaches a fixed and 
definite meaning to the term "INFINIBAND." That is, the term "INFINIBAND" has been adopted 
by the art as a term having a specific meaning and is, therefore, more than merely a trademarked 
term. Therefore, Applicant submits that the term "INFINIBAND" is a definite term. Accordingly, 
Applicant respectfully requests withdrawal of the rejection of claims 21-23. 

III. Claims Rejected Under 35 U.S.C. § 102 

Claims 10, 12-15, and 22 stand rejected under 35 U.S.C. § 102(a) as being anticipated by 
U.S. Patent Application Publication No. 2003/0033427 filed by Brahmaroutu ("Brahmaroutu"). 
Applicant respectfully traverses the rejection, at least in view of the amendments to independent 
claim 10. 

To anticipate a claim, the cited reference must disclose each and every element of the 
rejected claim (see MPEP § 2131). Amended claim 10 defines a method comprising the steps of 
"providing a plurality of first stage switches and a plurality of second stage switches coupling a 
plurality end nodes to one another to form a network, the plurality of second stage switches coupled 
to each of the plurality of first stage switches, calculating a plurality of routing trees comprising the 
plurality of first stage switches and one of the plurality of second stage switches" and creating "a 
path between each of the plurality of end nodes that enables the network operate as a strictly 
non-interfering network" (emphasis added). Applicant submits that Brahmaroutu fails to disclose at least 
these elements of claim 10. 

In making the rejection, the Patent Office does not cite Brahmaroutu as disclosing 
"providing a plurality of first stage switches and a plurality of second stage switches coupling a 
plurality end nodes to one another to form a network, the plurality of second stage switches coupled 
to each of the plurality of first stage switches, calculating a plurality of routing trees comprising the 
plurality of first stage switches and one of the plurality of second stage switches" and creating "a 
path between each of the plurality of end nodes that enables the network operate as a strictly 
non-interfering network," as recited in claim 10. Moreover, in reviewing Brahmaroutu, Applicant is 
unable to discern any sections of Brahmaroutu disclosing such elements. 

Brahmaroutu discloses "a method for determining a number of possible multiple paths for a 
given port in a switched fabric" (Brahmaroutu, paragraph [0001]). That is, Brahmaroutu discloses a 
method that calculates all of the possible paths between multiple nodes and determines the shortest 
path between each node from among the calculated possible paths (Brahmaroutu, paragraph [0041]). 
Specifically, and with reference to FIG. 4 of Brahmaroutu, Brahmaroutu discloses that: 

there are 6 different loopless paths between the host system 130 and 
the IOU 170 traversing a set of <L5, L9, L8>, <L5, L1, L2, L8>, 
<L5, L1, L3, L8>, <L5, L6, L10, L8>, <L5, L7, L10, L8> and <L5, 
L1, L4, L10, L8> links respectively. The LMC value for the two CA 
ports is, however, programmed to be four (4) which allows 16 
different paths each represented by a unique LID and LMC 
combination. Therefore, the channel adapter (CA1) 120 of the host 
system 130 and the channel adapter (CA2) 160 of the remote IOU 
170 can be addressed using any LID in the range 0x30-0x3F and 
0x50-0x5F, respectively. Each of these LIDs can potentially be used 
to reach the respective CA port. However, 10 out of possible 16 LID 
and LMC combinations do not represent any path between the 
channel adapter (CA1) 120 of the host system 130 and the channel 
adapter (CA2) 160 of the remote IOU 170 in this example. 
(Brahmaroutu, paragraph [0034]). 

Applicant submits that in Brahmaroutu's method, the shortest path between nodes for 
multiple pairs of nodes can include the same link, especially in larger network configurations. 
Specifically, and with reference to FIG. 4 of Brahmaroutu, an additional system or node connected 
to, for example, switch S2 would have a path (either the shortest path or otherwise) to IOU 170 that 
shares at least one link (e.g., link L1, link L4, link L9, or link L10) with a path (either the shortest path 
or otherwise) from host 130 to IOU 170. That is, because of the manner in which switches S1-S4 in 
Brahmaroutu are interconnected, any additional system or node added to Brahmaroutu's network 
would, at least in some instances, compete with other systems/nodes for network resources since 
there are not enough resources (e.g., links, switches, nodes, etc.) for creating a dedicated path 
between each source node and each destination node. In other words, at least two paths between a 
pair of sources and destinations in Brahmaroutu's network include a common link, which is contrary 
to the operation of a network that operates as a "strictly non-interfering network." 

By contrast, and with reference to Applicant's FIGS. 1-3, a plurality of second stage switches 
(e.g., second stage switches 118 (see FIG. 1), 218 (see FIG. 2), and 318 (see FIG. 3)) coupled to 
each of a plurality of first stage switches (e.g., first stage switches 116 (see FIG. 1), 216 (see FIG. 2), 
and 350 (see FIG. 3)) do not compete with one another for network resources. That is, a two-stage 
switching topology provides enough resources that each path between each source and each 
destination does not have to share resources. Therefore, Brahmaroutu fails to disclose each and 
every element of claim 10. 

The failure of Brahmaroutu to disclose each and every element of claim 10 is fatal to the 
anticipation rejection. Therefore, claim 10 is not anticipated by Brahmaroutu. Accordingly, 
Applicant respectfully requests withdrawal of the rejection of independent claim 10. 

Claims 12-15 and 22 depend from claim 10 and include all of the elements thereof. 
Therefore, Applicant submits that claims 12-15 and 22 are not anticipated by Brahmaroutu at least 
for the same reasons as claim 10, in addition to their own respective features. Accordingly, 
Applicant respectfully requests withdrawal of the rejection of claims 12-15 and 22. 

IV. Claims Rejected Under 35 U.S.C. § 103 

Claims 1-5, 7-9, 11, 17, 19-21, and 23 stand rejected under 35 U.S.C. § 103(a) as being 
obvious over Brahmaroutu in view of U.S. Patent Application Publication No. 2003/0118013 A1 
filed by Steele ("Steele"). Applicant respectfully traverses the rejection, at least in view of the 
amendments to independent claims 1, 10, and 17. 

To render a claim obvious, the cited references must teach or suggest each and every element 
of the rejected claim (see MPEP § 2143). Claim 11 depends from claim 10 and includes all of the 
elements thereof. Furthermore, Applicant submits that claims 1-5, 7-9, 17, 19-21, and 23 each recite 
elements similar to the elements of independent claim 10 discussed above with respect to the 
anticipation rejection based on Brahmaroutu. In rejecting claims 1-5, 7-9, 11, 17, 19-21, and 23, the 
Patent Office characterizes the disclosure in Brahmaroutu similarly to the anticipation rejection of 
claim 10 discussed above. Applicant has discussed above the failure of Brahmaroutu to disclose at 
least the elements of, "providing a plurality of first stage switches and a plurality of second stage 
switches coupling a plurality end nodes to one another to form a network, the plurality of second 
stage switches coupled to each of the plurality of first stage switches, calculating a plurality of 
routing trees comprising the plurality of first stage switches and one of the plurality of second stage 
switches" and creating "a path between each of the plurality of end nodes that enables the network 
operate as a strictly non-interfering network," as recited in claim 11 (via claim 10) and similarly 
recited in claims 1-5, 7-9, 17, 19-21, and 23, and submits that such discussion is equally applicable 
to an obviousness rejection of claims 1-5, 7-9, 11, 17, 19-21, and 23 based on Brahmaroutu. 
Therefore, Brahmaroutu fails to teach or suggest each and every element of claims 1-5, 7-9, 11, 17, 
19-21, and 23. The Patent Office relies on the disclosure in Steele to cure the defects of 
Brahmaroutu; however, Applicant submits that Steele fails to cure such defects. 

In making the rejection, the Patent Office alleges that Steele discloses "a Clos switch fabric 
that provides groups of source and destination ports as well as routing information" (Paper No./Mail 
Date 20071102, pages 8 and 12-13, citations omitted). The Patent Office does not cite Steele as 
disclosing the elements of, "providing a plurality of first stage switches and a plurality of second 
stage switches coupling a plurality end nodes to one another to form a network, the plurality of 
second stage switches coupled to each of the plurality of first stage switches, calculating a plurality 
of routing trees comprising the plurality of first stage switches and one of the plurality of second 
stage switches" and creating "a path between each of the plurality of end nodes that enables the 
network operate as a strictly non-interfering network," as recited in claim 11 (via claim 10) and 
similarly recited in claims 1-5, 7-9, 17, 19-21, and 23. Moreover, in reviewing Steele, Applicant is 
unable to discern any sections of Steele as disclosing such elements. Therefore, Steele fails to cure 
the defects of Brahmaroutu. 

The failure of the combination of Brahmaroutu and Steele to teach or suggest each and every 
element of claims 1-5, 7-9, 11, 17, 19-21, and 23 is fatal to the obviousness rejection. Therefore, 
claims 1-5, 7-9, 11, 17, 19-21, and 23 are not obvious over Brahmaroutu in view of Steele. 
Accordingly, Applicant respectfully requests withdrawal of the rejection of claims 1-5, 7-9, 11, 17, 
19-21, and 23. 

V. Claim Amendments 

Claims 3-5, 7, 9, 11, 13-15, and 19-23 have been amended so that various elements recited in 
these claims are consistent with their respective independent claims. 

CONCLUSION 

In view of the foregoing, it is believed that all claims now pending are in condition for 
allowance. A Notice of Allowance is earnestly solicited at the earliest possible date. If the Patent 
Office believes that a telephone conference would be useful in moving the application forward to 
allowance, the Patent Office is encouraged to contact the undersigned at (480) 385-5060 or 
jgraff@ifllaw.com. 

If necessary, the Commissioner is hereby authorized to charge payment or credit any 
overpayment to Deposit Account No. 50-2091 for any fees required under 37 C.F.R. §§ 1.16 or 1.17, 
particularly extension of time fees. 

Respectfully submitted, 

Date December 28, 2007 /JASON R. GRAFF/ 

Jason R. Graff 
Reg. No. 54,134 

Attachments: Replacement Specification 

MARKED-UP VERSION 



RELATED CASES 

[0001] Related subject matter is disclosed in U.S. patent application entitled "INFINIBAND 
SWITCH OPERATING IN A CLOS NETWORK" having application Ser. No. [[ ]] 
10/722,213 and filed on the same date herewith and assigned to the same assignee. 

[0002] Related subject matter is disclosed in U.S. patent application entitled "STRICTLY NON- 
INTERFERING NETWORK" having application Ser. No. [[ ]] 10/722,022 and filed on 
the same date herewith and assigned to the same assignee. 

[0003] Related subject matter is disclosed in U.S. patent application entitled "CONNECTION 
CONTROLLER" having application Ser. No. [[ ]] 10/722,021 and filed on the same date 
herewith and assigned to the same assignee. 

BACKGROUND OF THE INVENTION 

[0004] Current switching topologies for network operations can cause a network to suffer 
performance degradation due to latency. Significant delays from latency can result from queuing 
delays in network switches due to interference caused by competing traffic sources attempting to 
use the same network resources at the same time. This can cause packets to queue up in one or 
more switches and delay the packet's delivery to its destination. This increase in latency slows 
network response time and can result in lost packets and other disadvantageous network 
behavior. 

[0005] Accordingly, there is a significant need for an apparatus and method that overcomes the 
deficiencies of the prior art outlined above. 



BRIEF DESCRIPTION OF THE DRAWINGS 



[0006] Referring to the drawing: 

[0007] FIG. 1 depicts a network according to one embodiment of the invention; 

[0008] FIG. 2 depicts a network according to another embodiment of the invention; 

[0009] FIG. 3 depicts a network according to yet another embodiment of the invention; 

[0010] FIG. 4 depicts a block diagram of a network according to an embodiment of the 
invention; 

[0011] FIG. 5 illustrates a flow diagram of a method of the invention according to an 
embodiment of the invention; 

[0012] FIG. 6 illustrates a flow diagram of a method of the invention according to another 
embodiment of the invention; and 

[0013] FIG. 7 illustrates a flow diagram of a method of the invention according to yet another 
embodiment of the invention. 

[0014] It will be appreciated that for simplicity and clarity of illustration, elements shown in the 
drawing have not necessarily been drawn to scale. For example, the dimensions of some of the 
elements are exaggerated relative to each other. Further, where considered appropriate, reference 
numerals have been repeated among the Figures to indicate corresponding elements. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

[0015] In the following detailed description of exemplary embodiments of the invention, 
reference is made to the accompanying drawings that illustrate specific exemplary embodiments 
in which the invention may be practiced. These embodiments are described in sufficient detail to 
enable those skilled in the art to practice the invention, but other embodiments may be utilized 
and logical, mechanical, electrical and other changes may be made without departing from the 
scope of the present invention. The following detailed description is, therefore, not to be taken in 
a limiting sense, and the scope of the present invention is defined only by the appended claims. 

[0016] In the following description, numerous specific details are set forth to provide a thorough 
understanding of the invention. However, it is understood that the invention may be practiced 
without these specific details. In other instances, well-known circuits, structures and techniques 
have not been shown in detail in order not to obscure the invention. 

[0017] In the following description and claims, the terms "coupled" and "connected," along with 
their derivatives, may be used. It should be understood that these terms are not intended as 
synonyms for each other. Rather, in particular embodiments, "connected" may be used to 
indicate that two or more elements are in direct physical or electrical contact. However, 
"coupled" may mean that two or more elements are not in direct contact with each other, but yet 
still co-operate or interact with each other. 

[0018] For clarity of explanation, the embodiments of the present invention are presented, in 
part, as comprising individual functional blocks. The functions represented by these blocks may 
be provided through the use of software, or shared or dedicated hardware, including, but not 
limited to, hardware capable of executing software. The present invention is not limited to 
implementation by any particular set of elements, and the description herein is merely 
representational of one embodiment. 

[0019] FIG. 1 depicts a network 100 according to one embodiment of the invention. In an 
embodiment, network 100 can be implemented in one or more chassis in a backplane-type 
interconnect environment. In another embodiment, network 100 can be implemented on the same 
switching board or switching chip. Network 100 may utilize a packet data protocol for traffic 
movement among switches and end-node devices. For example, network 100 may use InfiniBand 
INFINIBAND. InfiniBand INFINIBAND is specified by the InfiniBand™ INFINIBAND™ 
Architecture Specification, Release 1.1 or later, as promulgated by the InfiniBand™ 
INFINIBAND™ Trade Association, 5440 SW Westgate Drive, Suite 217, Portland, 
Oregon 97221. As such, network 100 utilizes data packets having fixed or variable length, 
defined by the applicable protocol. 

[0020] The network 100 depicted in FIG. 1 includes first stage InfiniBand INFINIBAND 
switches 116 coupled to second stage InfiniBand INFINIBAND switches 118 by a plurality of 
links 115. In an embodiment, each of plurality of links 115 can be bi-directional. In an 
embodiment, plurality of links 115 operated under InfiniBand INFINIBAND can be 1x, 4x or 
12x speed links. In an embodiment, each of first stage InfiniBand INFINIBAND switches 116 
can be coupled to one or more of a plurality of end nodes 114. Each of plurality of end nodes 114 
can be, for example and without limitation, application servers, database servers, and the like. In 
an embodiment, each of plurality of end nodes 114 can act as a source (i.e. creating a packet and 
placing it in network 100), or a destination (an end point for a packet created by a source). In 
another embodiment, one or more of each of plurality of end nodes 114 can act as both a source 
for one packet, and as a destination for another packet. For example, source 122 can create a 
packet with a destination 126. In an embodiment, network 100 is a non-blocking network. 

[0021] In an embodiment, two or more first stage InfiniBand INFINIBAND switches 116 may 
be implemented within a single switching entity, for example a single switching chip, physical 
switching unit, and the like. Also, two or more of second stage InfiniBand INFINIBAND 
switches 118 may be implemented within a single switching entity. In yet another embodiment, 
two or more InfiniBand INFINIBAND switches may be functionally replaced with either a 
single InfiniBand INFINIBAND switch or a subnetwork with a non-blocking topology. In an 
exemplary embodiment of the invention, network 100 can be built using any number of 
InfiniBand INFINIBAND switches, where an InfiniBand INFINIBAND switch can be a 24-port 
Mellanox Anafa-II InfiniBand INFINIBAND Switch, manufactured by Mellanox Technologies, 
Inc., 2900 Stender Way, Santa Clara, California 95054. The invention is not limited to the 
use of this switch and another type or model of InfiniBand INFINIBAND switch may be used 
and be within the scope of the invention. 



[0022] The plurality of links 115 can use, for example and without limitation, 100 ohm 
differential transmit and receive pairs per channel. Each channel can use high-speed 
serialization/deserialization (SERDES) and 8b/10b encoding. 

[0023] In network terminology, admissible traffic patterns are traffic patterns in an InfiniBand 
INFINIBAND network where the traffic entering the InfiniBand INFINIBAND network does 
not exceed the InfiniBand INFINIBAND network's ability to output traffic. Interference in a 
network occurs when competing traffic sources attempt to use the same network resources at the 
same time. This can result in a degradation of the sustained rate of data transfer which one or 
more of the sources can maintain. It will either result in an increased latency or packet loss. In a 
network operating using InfiniBand INFINIBAND, link flow control algorithms guarantee that 
short-term congestion will not result in packet loss. Therefore, in a network operating using 
InfiniBand INFINIBAND, short-term congestion will manifest itself as increased data transfer 
latency. 

[0024] A non-interfering network (i.e. a network without interference) is a network for which the 
performance degradation for any admissible traffic pattern is guaranteed to conform to a 
pre-specified bound. This bound can be either deterministic or statistical. For example, a network can 
be deemed non-interfering if the worst-case end-to-end latency is guaranteed to be less than ten 
microseconds. This is an example of a deterministic bound. As another example, a network can 
be deemed non-interfering if 99% of packets experience network latencies of less than two 
microseconds. This is an example of a statistical bound. These are just examples and are not 
limiting of the invention. The appropriate choice for a pre-specified bound is application 
specific, and a network supporting multiple applications can impose different bounds on 
performance on each traffic type. 
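
The two kinds of bound described above can be illustrated with a minimal sketch. The function names and the sample latencies are hypothetical and are not part of the specification; checking measured samples can only show that a bound was met for those samples and does not by itself establish the guarantee that the definition requires.

```python
def meets_deterministic_bound(latencies_us, bound_us):
    """Return True when every observed latency is below the bound.

    Mirrors the worst-case example in the text (e.g. less than ten
    microseconds end to end).
    """
    return max(latencies_us) < bound_us


def meets_statistical_bound(latencies_us, bound_us, percent):
    """Return True when at least `percent` percent of packets are below the bound.

    Mirrors the example in the text of 99% of packets experiencing
    latencies of less than two microseconds. Integer arithmetic is used
    to avoid floating-point rounding at the threshold.
    """
    within = sum(1 for latency in latencies_us if latency < bound_us)
    return within * 100 >= percent * len(latencies_us)


# Hypothetical sample: 99 packets at 1.0 us and one packet at 5.0 us.
samples = [1.0] * 99 + [5.0]
print(meets_deterministic_bound(samples, 10.0))  # worst case under 10 us
print(meets_statistical_bound(samples, 2.0, 99))  # 99% under 2 us
```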

[0025] A strictly non-interfering network (SNIN) is a network for which the only queuing delays 
experienced by an admissible traffic pattern are attributable to the multiplexing of packets from 
slow links onto a faster link whose aggregate bandwidth at least equals the sum of the 
bandwidths of the smaller links. In a SNIN, competing traffic sources do not attempt to use the 
same network resources at the same time. The implementation of a SNIN requires that resources 
be dedicated through the network in support of an active communication session. In order to 
accomplish this, non-blocking networks can be used. 

[0026] A network is non-blocking if it has adequate internal resources to carry out all possible 
admissible traffic patterns. There are different degrees of non-blocking performance based upon 
the sophistication of the control policy required to achieve non-blocking performance. 

[0027] Most network switching applications allow the establishment of new connections and the 
tear down of old ones. It is possible that for a network with a non-blocking topology, a new 
connection can be blocked due to poor or unfortunate assignment of previously established 
connections. A strictly non-blocking network is a network for which any new admissible 
connection may be accepted independent of the state of preexisting connections, or the policy 
used to reroute preexisting connections, without changing the routes of the preexisting 
connections. A crossbar network is an example of a strictly non-blocking network. As another 
example, a rearrangably non-blocking network is a network that may be augmented by a 
mechanism to reroute preexisting connections such that it is possible to carry the preexisting 
connections and any new admissible connection. 

[0028] Another type of non-blocking network is a Clos CLOS network. Clos CLOS networks are 
known in the art. For example, see "A Study of Non-Blocking Switching Networks" by Charles 
Clos, Bell System Technical Journal, 1953, vol. 32, no. 2, pp. 406-424. In an embodiment, Clos 
CLOS networks can include FAT trees and K-nary arrays, other non-blocking networks, and the 
like. In an embodiment, network 100 is a Clos CLOS network 120. In an embodiment, Clos 
CLOS network 120 can be a two stage hierarchical network in which each node in the first stage 
connects to each node in the second stage through a plurality of links 115. In the embodiment 
shown in FIG. 1, first stage InfiniBand INFINIBAND switches 116 can be considered the first 
stage and second stage InfiniBand INFINIBAND switches 118 can be considered the second 
stage. 



[0029] As an illustration of an embodiment of the invention, traffic can traverse network 100. 
Traffic (i.e. a packet) originating at end node 122 can enter InfiniBand INFINIBAND switch 106 
through an end-node port 112 and pass through an internal switch link. The packet proceeds to one 
of second stage InfiniBand INFINIBAND switches 118, for example InfiniBand INFINIBAND 
switch 102, via one of plurality of links 115 (where plurality of links 115 are bi-directional). The 
packet crosses through internal switch link at InfiniBand INFINIBAND switch 102, and back to 
one of first stage InfiniBand INFINIBAND switches 116, for example InfiniBand INFINIBAND 
switch 108, via one of plurality of links 115. The packet can then proceed to an end node coupled 
to InfiniBand INFINIBAND switch 108, for example end node 126. 

[0030] Although only one of plurality of links 115 is shown between each first stage InfiniBand 
INFINIBAND switches 116 and second stage InfiniBand INFINIBAND switches 118, the 
invention is not limited to only one link. In other embodiments there can be more than one of 
plurality of links 115 between each of first stage InfiniBand INFINIBAND switches 116 and 
each of second stage InfiniBand INFINIBAND switches 118. 

[0031] The number of plurality of links 115 between each pairing of first stage InfiniBand 
INFINIBAND switches 116 and second stage InfiniBand INFINIBAND switches 118 compared 
to the number of end-node ports on each of first stage InfiniBand INFINIBAND switches 116 
determines the degree of blocking potentially experienced by traffic crossing Clos CLOS 
network 120. For example, if the number of second stage InfiniBand INFINIBAND switches 118 
is greater than or equal to the number of end node ports 112 on a first stage InfiniBand 
INFINIBAND switch 116, then Clos CLOS network 120 is a rearrangably non-blocking Clos 
CLOS network. As explained above, network 100 is non-blocking if it has adequate internal 
resources to carry out all admissible traffic patterns. As another example, Clos CLOS network 
120 is strictly non-blocking if the number of second stage InfiniBand INFINIBAND switches 
118 is equal to or greater than 2*(number of end-node ports 112)-1. 
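
The two thresholds stated above can be written out as a short sketch. The function name, the parameter names, and the label returned when neither condition holds are assumptions made for illustration; only the two inequalities come from the text.

```python
def classify_clos(num_second_stage, end_node_ports_per_first_stage):
    """Classify a two-stage network using the two conditions in the text.

    m = number of second stage switches
    n = number of end-node ports on a first stage switch
    """
    m = num_second_stage
    n = end_node_ports_per_first_stage
    if m >= 2 * n - 1:
        # Stricter condition: m >= 2*n - 1
        return "strictly non-blocking"
    if m >= n:
        # Weaker condition: m >= n
        return "rearrangably non-blocking"
    # Label chosen for this sketch; the text does not name this case.
    return "neither condition met"


# With four end-node ports per first stage switch:
print(classify_clos(4, 4))  # meets m >= n only
print(classify_clos(7, 4))  # meets m >= 2*n - 1
```

The strict condition is checked first because any network that satisfies it also satisfies the weaker one.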

[0032] Although FIG. 1 depicts a two stage hierarchical network, which can be a Clos CLOS 
network 120, this is not limiting of the invention. Network 100, and Clos CLOS network 120 can 
have any number of hierarchical stages and be within the scope of the invention. In other words, 
multistage networks and multistage Clos CLOS networks are within the scope of the invention. 



[0033] Although FIG. 1 depicts three first stage InfiniBand INFINIBAND switches 116, 
specifically, InfiniBand INFINIBAND switches 106, 108, 110, and two second stage InfiniBand 
INFINIBAND switches 118, specifically InfiniBand INFINIBAND switches 102, 104, any 
number of first stage InfiniBand INFINIBAND switches 116 and second stage InfiniBand 
INFINIBAND switches 118 are within the scope of the invention. Also, any number of end-node 
ports 112 are within the scope of the invention. Further, any number of switch interlink ports 
coupling InfiniBand INFINIBAND switches to each other via plurality of links 115 are within 
the scope of the invention. Still further, any number of plurality of end nodes 114 are within the 
scope of the invention. 

[0034] FIG. 2 depicts a network 200 according to another embodiment of the invention. As 
shown in FIG. 2, network 200 includes first stage InfiniBand INFINIBAND switches 216 
coupled to second stage InfiniBand INFINIBAND switches 218 via plurality of links. In an 
embodiment, network 200 can be a Clos CLOS network 220 since each node in the first stage 
connects to each node in the second stage. In an embodiment, each of plurality of first stage 
InfiniBand INFINIBAND switches 216 can be coupled to one or more of plurality of end nodes 
(not shown for clarity), via plurality of end node ports. For example, InfiniBand INFINIBAND 
switch 210 can comprise plurality of end node ports 252, InfiniBand INFINIBAND switch 211 
can comprise plurality of end node ports 254, InfiniBand INFINIBAND switch 212 can comprise 
plurality of end node ports 256, and InfiniBand INFINIBAND switch 213 can comprise plurality 
of end node ports 258. 

[0035] In the embodiment, depicted in FIG. 2, second stage InfiniBand INFINIBAND switches 
218 include InfiniBand INFINIBAND switch 202, 204, 206, 208. In network 200, particularly in 
Clos CLOS network 220, the stage of InfiniBand INFINIBAND switches furthest from plurality 
of end nodes are referred to as spine nodes. In the embodiment depicted in FIG. 2, second stage 
InfiniBand INFINIBAND switches 218 can be considered spine nodes. Therefore, in this 
embodiment, each InfiniBand INFINIBAND switch 202, 204, 206, 208 is a spine node. 



[0036] A spanning tree is any group of nodes and links, (where nodes can be InfiniBand 
INFINIBAND switches, end nodes, and the like), containing is a unique path between every pair 
of nodes in the network. A routing tree is a spanning tree that is rooted at a spine node that 
defines the shortest path tree from the spine node to each end node. 

[0037] In network 200, there is a routing tree for each of second stage InfiniBand INFINIBAND 
switches 218. In an embodiment, routing tree 230 includes InfiniBand INFINIBAND switch 202, 
which is a spine node, and associated links to each of first stage InfiniBand INFINIBAND 
switches 216 and associated inter-switch links through each of first stage InfiniBand 
INFINIBAND switches 216 to each of plurality of end node ports 252, 254, 256, 258. 

[0038] In an embodiment, routing tree 232 includes InfiniBand INFINIBAND switch 204, which 
is a spine node, and associated links 225 to each of first stage InfiniBand INFINIBAND switches 
216 and associated inter-switch links through each of first stage InfiniBand INFINIBAND 
switches 216 to each of plurality of end node ports 252, 254, 256, 258. 

[0039] In an embodiment, routing tree 234 includes InfiniBand INFINIBAND switch 206, which 
is a spine node, and associated links to each of first stage InfiniBand INFINIBAND switches 216 
and associated inter-switch links through each of first stage InfiniBand INFINIBAND switches 
216 to each of plurality of end node ports 252, 254, 256, 258. 

[0040] In an embodiment, routing tree 236 includes InfiniBand INFINIBAND switch 208, which 
is a spine node, and associated links to each of first stage InfiniBand INFINIBAND switches 216 
and associated inter-switch links through each of first stage InfiniBand INFINIBAND switches 
216 to each of plurality of end node ports 252, 254, 256, 258. 
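
The four routing trees described above, one rooted at each spine node, can be sketched as a simple data structure. This is an illustration only: the function names, the dictionary layout, and the port labels such as "252a" are hypothetical, and the sketch assumes each spine node links to every first stage switch as in FIG. 2.

```python
def build_routing_trees(spines, ports_by_switch):
    """Build one routing tree per spine node.

    Each tree maps an end-node port to the first stage switch that owns
    it; the spine at the root reaches every first stage switch directly.
    """
    trees = {}
    for spine in spines:
        trees[spine] = {
            port: switch
            for switch, ports in ports_by_switch.items()
            for port in ports
        }
    return trees


def path_via_tree(trees, spine, src_port, dst_port):
    """Return the unique path between two end-node ports within one tree."""
    tree = trees[spine]
    src_switch, dst_switch = tree[src_port], tree[dst_port]
    if src_switch == dst_switch:
        # Both ports hang off the same first stage switch.
        return [src_port, src_switch, dst_port]
    return [src_port, src_switch, spine, dst_switch, dst_port]


# Reference numerals from FIG. 2; individual port labels are hypothetical.
SPINES = [202, 204, 206, 208]
PORTS = {
    210: ["252a", "252b"],
    211: ["254a", "254b"],
    212: ["256a", "256b"],
    213: ["258a", "258b"],
}
trees = build_routing_trees(SPINES, PORTS)
print(path_via_tree(trees, 202, "252a", "254a"))
```

Under these assumptions, two ports on different first stage switches have one candidate path per routing tree, so selecting a tree selects the spine node that the packet crosses.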

[0041] In an illustration of an embodiment, a packet created at an end node coupled to
INFINIBAND switch 210 can traverse a path 225. The packet can enter INFINIBAND switch 210 via
end node port 221, traverse inter-switch link 229, continue on a link to INFINIBAND switch 202,
traverse inter-switch link 227, travel to INFINIBAND switch 211, traverse inter-switch link 231,
and exit via end node port 223 to another end node. In this embodiment, the packet travels path
225 between an end node coupled to INFINIBAND switch 210 and an end node coupled to INFINIBAND
switch 211. In an embodiment, path 225 is a shortest path between spine node 202 and each of the
plurality of end nodes. In this embodiment, the packet traveled from a source to a destination
using routing tree 230. As is known in the art, each of the destinations in network 200 operating
using INFINIBAND has a Base Local Identifier, known as a BaseLID 237, which is analogous to an
address of the destination.

[0042] In this embodiment, any packet created at a source needs a BaseLID of the destination
and a routing tree to define a unique path from the source to the destination. In an embodiment,
the sum of the BaseLID and the routing tree (which can be, for example, a routing tree ID) can be
a Destination Local Identifier (DLID). The DLID includes the destination port (as designated by
the BaseLID) and the path to get there from the source, where the path is identified by, for
example and without limitation, a routing tree ID.
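
As a non-limiting sketch of the arithmetic described above, a DLID can be formed by adding a
routing tree ID to a BaseLID. The constant NUM_ROUTING_TREES, the function names, and the
assumption that each BaseLID is a multiple of the number of routing trees (so that a DLID can be
split back into its parts) are illustrative only and are not taken from the specification.

```python
# Assumption: four routing trees, one per spine switch, as in network 200.
NUM_ROUTING_TREES = 4


def make_dlid(base_lid, routing_tree_id):
    """DLID = BaseLID + routing tree ID."""
    if not 0 <= routing_tree_id < NUM_ROUTING_TREES:
        raise ValueError("routing tree ID out of range")
    return base_lid + routing_tree_id


def split_dlid(dlid):
    """Recover (BaseLID, routing tree ID).

    Only valid under the assumption that every BaseLID is a multiple of
    NUM_ROUTING_TREES.
    """
    routing_tree_id = dlid % NUM_ROUTING_TREES
    return dlid - routing_tree_id, routing_tree_id


dlid = make_dlid(base_lid=16, routing_tree_id=2)
```

Under these assumptions, each destination occupies a contiguous block of DLIDs, one per routing
tree, so a source selects a path simply by choosing which DLID in the block to place in the
packet.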

[0043] In an embodiment, network 200 can be a CLOS network 220, and also a rearrangeably
non-blocking CLOS network, since the number of second stage INFINIBAND switches 218 is greater
than or equal to the number of end node ports on a first stage INFINIBAND switch 216. In another
embodiment, network 200 can be a strictly non-blocking CLOS network since the number of second
stage INFINIBAND switches 218 is equal to or greater than 2*(number of end node ports on a first
stage INFINIBAND switch 216)-1. In an embodiment, traffic in network 200 can be scheduled such
that the only queuing delays experienced by an admissible traffic pattern are attributable to the
multiplexing of packets from slow links onto a faster link whose aggregate bandwidth at least
equals the sum of the bandwidths of the slower links. In this embodiment, competing traffic
sources do not attempt to use the same network resources at the same time. As defined above,
network 200 can then be a SNIN 219.
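
The two switch-count conditions stated above can be expressed, for illustration only, as a short
check. The function name classify_clos and its parameter names are assumptions of this sketch.

```python
def classify_clos(num_second_stage_switches, end_ports_per_first_stage_switch):
    """Classify a network using the switch-count conditions stated above.

    m = number of second stage switches
    n = number of end node ports on a first stage switch
    """
    m = num_second_stage_switches
    n = end_ports_per_first_stage_switch
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"


# Example: four second stage switches and four end node ports per first stage switch.
result = classify_clos(4, 4)
```

With four second stage switches and four end node ports per first stage switch, only the
rearrangeable condition is met; seven second stage switches would be needed to meet the strict
condition for the same port count.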

[0044] FIG. 3 depicts a network 300 according to yet another embodiment of the invention. As
shown in FIG. 3, network 300 includes first stage INFINIBAND switches 350 coupled to second stage
INFINIBAND switches 318 via plurality of links. In an embodiment, each of plurality of first
stage INFINIBAND switches 350 can be coupled to one or more of plurality of end nodes (not shown
for clarity), via plurality of end node ports. For example, INFINIBAND switch 310 can comprise
plurality of end node ports 352, INFINIBAND switch 311 can comprise plurality of end node ports
354, INFINIBAND switch 312 can comprise plurality of end node ports 356, and INFINIBAND switch
313 can comprise plurality of end node ports 358.

[0045] As is known in the art, a dilated network is one in which the total bandwidth between at
least one pair of switches is greater than the bandwidth of a link connecting a switch to an end
node. In an embodiment, network 300 can be a dilated network as there are two links between
each of first stage INFINIBAND switches 350 and second stage INFINIBAND switches 318. Dilated
networks are significant because they allow the cost-effective construction of non-blocking
networks. Dilated networks are also significant when links of differing speeds are used in the
network.

[0046] In an embodiment, network 300 is equivalent to network 200, where network 300 is
dilated. Therefore, network 300 is also a CLOS network 320. Network 300 is more cost-effective
as only two second stage INFINIBAND switches 318 are required. As is known in the art of
networking, equivalence can be shown between network 300 and network 200. Equivalence allows a
path in network 300 to be mapped back to a path in network 200, such that non-interfering traffic
flows remain non-interfering. Any admissible set of connections can be carried by either of
network 200 or network 300. Therefore, a dilated network such as network 300 can carry any set of
connections that network 200 can. Therefore, network 300 can be a rearrangeably non-blocking CLOS
network, a strictly non-blocking CLOS network and/or a SNIN 319 as was shown with reference to
network 200.

[0047] In the embodiment depicted in FIG. 3, second stage INFINIBAND switches 318 include
INFINIBAND switches 302, 304. In network 300, particularly in CLOS network 320, the stage of
INFINIBAND switches furthest from plurality of end nodes are referred to as spine nodes. In the
embodiment depicted in FIG. 3, second stage INFINIBAND switches 318 can be considered spine
nodes. Therefore, in this embodiment, each of INFINIBAND switches 302, 304 is a spine node. In
network 300, there may be multiple shortest paths between a spine node and an end node. A
generalization can be made from the non-dilated case shown in FIG. 2 by defining a routing tree
in such a way that is sufficient to cover all the paths for a routing tree between a spine node
and the plurality of end nodes.

[0048] In network 300, there are two routing trees for each of second stage INFINIBAND switches
318. In an embodiment, routing tree 330 includes INFINIBAND switch 302, which is a spine node,
and associated links to each of first stage INFINIBAND switches 350 and associated inter-switch
links through each of first stage INFINIBAND switches 350 to each of plurality of end node ports
352, 354, 356, 358.

[0049] In an embodiment, routing tree 332 includes INFINIBAND switch 302, which is a spine node,
and associated links to each of first stage INFINIBAND switches 350 and associated inter-switch
links through each of first stage INFINIBAND switches 350 to each of plurality of end node ports
352, 354, 356, 358.

[0050] In an embodiment, routing tree 334 includes INFINIBAND switch 304, which is a spine node,
and associated links to each of first stage INFINIBAND switches 350 and associated inter-switch
links through each of first stage INFINIBAND switches 350 to each of plurality of end node ports
352, 354, 356, 358.

[0051] In an embodiment, routing tree 336 includes INFINIBAND switch 304, which is a spine node,
and associated links to each of first stage INFINIBAND switches 350 and associated inter-switch
links through each of first stage INFINIBAND switches 350 to each of plurality of end node ports
352, 354, 356, 358.

[0052] In an illustration of an embodiment, a packet created at an end node coupled to
INFINIBAND switch 312 can traverse a path to an end node coupled to INFINIBAND switch 314. The
packet can enter INFINIBAND switch 312 via end node port 321, traverse inter-switch link 329,
continue on a link (using routing tree 334) to INFINIBAND switch 304, traverse inter-switch link
327, travel to INFINIBAND switch 314, traverse inter-switch link 331, and exit via end node port
323 to another end node. In this embodiment, the packet travels the path between an end node
coupled to INFINIBAND switch 312 and an end node coupled to INFINIBAND switch 314. In an
embodiment, the path is a shortest path between spine node 304 and each of the plurality of end
nodes. In this embodiment, the packet traveled from a source to a destination using routing tree
334. As is known in the art, each of the destinations in network 300 operating using INFINIBAND
has a BaseLID. The sum of the BaseLID and the routing tree (which can be, for example, a routing
tree ID) can be a DLID analogous to that described above with reference to FIG. 2.

[0053] FIG. 4 depicts a block diagram of a network 400 according to an embodiment of the
invention. Network 400 includes a path determination mechanism that programs forwarding
tables of INFINIBAND switches with paths appropriate to make network 400 operate as a SNIN 419.
As shown in FIG. 4, network 400 can include one or more end nodes 406, which are representative
of plurality of end nodes 114 shown in FIG. 1 and referred to in FIG. 2 and FIG. 3. End node 406
can be coupled to a connection controller 402, which is in turn coupled to master subnet manager
404. Master subnet manager 404 is also coupled to each of one or more INFINIBAND switches 401,
which represents any of INFINIBAND switches referred to in FIGS. 1-3.

[0054] Network 400, when operating using INFINIBAND, has one master subnet manager 404, which
can reside on a port, INFINIBAND switch, router, end node, and the like. In another embodiment,
master subnet manager 404 can be distributed among any number of INFINIBAND switches, end nodes
and ports. Master subnet manager 404 can be implemented in hardware or software. When there are
multiple subnet managers in network 400, one subnet manager will include master subnet manager
404 and any other subnet managers within network 400 may become a standby subnet manager.

[0055] In an embodiment, master subnet manager 404 manages network 400 and can initialize
and configure network 400. This can include discovering a topology of network 400, establishing
possible paths among INFINIBAND switches and end nodes, assigning local identifiers to each port
in network 400, sweeping the network and discovering and managing changes in topology of network
400, and the like. In the realm of INFINIBAND, network 400 can be considered a subnet.

[0056] In an embodiment, master subnet manager 404 can include network topology data 405,
which contains data on network 400 and all paths, INFINIBAND switches, end nodes, links, and the
like. Master subnet manager 404 can also include an SNIN policy entity, which can be a mechanism
to specify whether the policy of operating network 400 as an SNIN is in effect.

[0057] In an embodiment, connection controller 402 can be a software entity responsible for
receiving a requested traffic pattern 403 from one or more end nodes 406, routing connections in
network 400 in a non-interfering fashion and conveying routing information to respective end
nodes. In other words, connection controller 402 can receive connection requests from end nodes
and amalgamate them to form requested traffic pattern 403. In an embodiment, connection
controller 402 can also communicate with master subnet manager 404 to pre-program end nodes
in a way that is consistent with non-interfering operation of network 400. In an embodiment,
connection controller 402 can reside on a port, INFINIBAND switch, router, end node, and the
like. In another embodiment, connection controller 402 can be distributed among any number of
INFINIBAND switches, end nodes and ports.

[0058] In an embodiment, connection controller 402 can include network topology cache 418,
which maintains a local representation of network topology data 405. In other words, network
topology cache 418 can maintain a local representation of master subnet manager's 404 view of
network topology data 405, including paths established between INFINIBAND switches, end nodes,
and the like, of network 400. Connection controller 402 can also include logical traffic pattern
cache 416, which is responsible for storing requested traffic pattern 403 received from one or
more of end nodes 406.



[0059] Connection controller 402 can also include packing algorithm 414, which can combine
requested traffic pattern 403 with network topology data 405 stored in network topology cache
418 to calculate actual traffic pattern 412. In an embodiment, actual traffic pattern 412 can
include the set of paths that each packet in requested traffic pattern 403 is to use in order to
achieve non-interfering operation of network 400. Logical network state entity 420 stores actual
traffic pattern 412 from packing algorithm 414 and communicates actual traffic pattern 412 to
sources at each end node 406 included in requested traffic pattern 403.

[0060] Packing algorithm 414 can include rearrangement algorithm 409. In an embodiment,
rearrangement algorithm 409 can identify how to rearrange a network so as to allow the
admission of a new admissible connection in a non-interfering fashion. In an embodiment, the
input to rearrangement algorithm 409 can be a PAULL matrix representing a non-interfering
network state and a request to establish a new connection. The output of rearrangement
algorithm 409 can be a new PAULL matrix representing a non-interfering network state in
which the new connection is carried in addition to the pre-existent connections. An example of
an embodiment of rearrangement algorithm 409 is HUI's rearrangement algorithm. It is to be
understood that HUI's rearrangement algorithm is merely exemplary and that other rearrangement
algorithms are included in the scope of the invention.

[0061] In some networks, such as Folded Networks, rearrangement algorithm 409 can find a path 
for an admissible traffic pattern. However, the resulting path may have loops in it. After 
determining the path for all connections, but prior to having instantiated the paths, each path can 
be independently pruned to remove any loops. 
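
The pruning step described above can be sketched, for illustration only, as follows. The
representation of a path as a list of node names and the function name prune_loops are
assumptions of this sketch; the specification does not prescribe a particular pruning
procedure.

```python
def prune_loops(path):
    """Return `path` with any loops removed.

    Whenever a node is revisited, everything between its first visit and the
    revisit is discarded, so each node appears at most once in the result.
    """
    pruned = []
    position = {}                      # node -> index of that node in `pruned`
    for node in path:
        if node in position:
            cut = position[node]
            for removed in pruned[cut + 1:]:
                del position[removed]  # forget nodes that were inside the loop
            del pruned[cut + 1:]
        else:
            position[node] = len(pruned)
            pruned.append(node)
    return pruned


# Hypothetical folded-network path in which source and destination share a first
# stage switch, so the trip up to the spine and back forms a loop.
looped = ["source", "leaf_a", "spine", "leaf_a", "destination"]
short = prune_loops(looped)
```

In this example the excursion to the spine node is removed and the packet is switched directly
within the shared first stage switch.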

[0062] In a CLOS network, the tuple (source, destination, spine node) uniquely identifies every
path that could potentially be selected as a consequence of rearrangement algorithm 409. The
tuple (source, destination, spine node) defines the path obtained by applying loop removal to the
path obtained by taking the shortest path from source to spine node followed by the shortest path
from spine node to destination. As described above, a routing tree is a shortest-path spanning
tree rooted at one of the spine nodes. The tuple (source, destination, routing tree) identifies a
loop-less shortest path from source to destination contained entirely within the routing tree.
This identification of a path is unique in network 400. The identification of a minimally
sufficient set of routing trees to support rearrangement algorithm 409 allows programming of
INFINIBAND switch forwarding tables and enables the realization of network 400 as a SNIN 419.
This is discussed further below.

[0063] Network 400 can include end node 406. End node 406 is representative of plurality of end
nodes 114 shown in FIG. 1 and referred to in FIG. 2 and FIG. 3. End node 406 can include
process 426, which can be a user process that wishes to connect with network 400, in particular
SNIN 419. Process 426 can be a program, job, and the like, contained in memory on end node
406 and controlled by a processor (not shown) on end node 406. End node 406, when operating
using INFINIBAND, can include queue pair 424, which represents one half (either receive or
transmit) of an INFINIBAND communications process. Queue pair 424 is known in the art. End node
406 can include QP mesh manager, which can be a software entity responsible for maintaining
multiple queue pairs existent on end node 406, communicating with logical network state entity
420 to receive actual traffic pattern 412 pertaining to packet 408 created at end node 406, and
informing end node 406 (as a source) which queue pair to use at any given instant in time.

[0064] Network 400 can include INFINIBAND switch 401, which represents any of INFINIBAND
switches referred to in FIGS. 1-3. INFINIBAND switch 401 can include forwarding table 415 to
store, in one embodiment, set of forwarding instructions 413 and plurality of DLIDs 410. As
discussed above, a DLID comprises a BaseLID and reference to a routing tree (routing tree ID). In
an embodiment, a packet 408 with a DLID 421 in the packet header 411, created at end node 406
acting as a source, enters INFINIBAND switch 401. DLID 421 is looked up in forwarding table 415
to find corresponding one of plurality of DLIDs 410. Packet 408 is then forwarded toward a
destination based on the set of forwarding instructions 413 corresponding to DLID 421.
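
For illustration only, the lookup described above can be sketched with a forwarding table
modeled as a mapping from DLID to outgoing port. The dictionary representation, the field name
"dlid", the port numbers, and the function name forward are assumptions of this sketch.

```python
# Hypothetical forwarding table for one switch: DLID -> outgoing port number.
forwarding_table = {
    18: 3,   # BaseLID 16 reached via routing tree 2 leaves on port 3
    19: 4,   # BaseLID 16 reached via routing tree 3 leaves on port 4
}


def forward(packet_header, table):
    """Return the outgoing port for a packet, or None if the DLID is unknown."""
    dlid = packet_header["dlid"]     # the DLID carried in the packet header
    return table.get(dlid)           # match against the DLIDs stored in the table


port = forward({"dlid": 18}, forwarding_table)
```

The same destination can thus be reached over different links depending only on which DLID the
source placed in the packet header.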

[0065] In an embodiment, when network 400 is initialized, or when network 400 has a topology
change, forwarding table 415 of each INFINIBAND switch 401 can be populated with plurality of
DLIDs 410 and set of forwarding instructions 413 such that network 400 operates as a SNIN 419 if
SNIN policy is in effect per SNIN policy entity 407. This can begin with connection controller
402 calculating a plurality of routing trees for the plurality of INFINIBAND switches in network
400. Connection controller 402 can receive the topology of network 400 (network topology data
405) from master subnet manager 404 as described above. A plurality of routing trees can be
calculated based on each spine node in a CLOS network as described with reference to FIGS. 2 and
3.

[0066] Thereafter, a plurality of DLIDs 410 and a set of forwarding instructions 413 for each
INFINIBAND switch 401 can be calculated where each of the plurality of DLIDs 410 corresponds to
one of the routing trees of which INFINIBAND switch 401 is part and one of a plurality of
destinations as referenced by a BaseLID. In an embodiment, calculating the plurality of routing
trees includes, for each spine node, calculating a shortest path from the spine node to each of a
plurality of sources and a plurality of destinations. The plurality of routing trees include at
least a portion of the plurality of INFINIBAND switches in network 400 and the corresponding
plurality of links that form a shortest path from at least one of the plurality of sources or one
of the plurality of destinations to the spine node of network 400. The addition of a routing tree
(routing tree ID) to a BaseLID produces a DLID for a given destination. Forwarding table 415 will
only use the links associated with the routing tree for that particular DLID.

[0067] In an embodiment, forwarding table 415 can be populated as each DLID and set of
forwarding instructions is calculated. In another embodiment, each DLID and set of forwarding
instructions can be sent to INFINIBAND switch 401 after the plurality of DLIDs 410 and set of
forwarding instructions 413 are calculated for each of plurality of INFINIBAND switches in
network 400.

[0068] Once forwarding table 415 is populated at each of INFINIBAND switches 401 in network
400, connection controller 402 and master subnet manager 404 can be coupled to operate network
400 as a SNIN 419. Packet 408 can be created at one of a plurality of sources, where the one of
the plurality of sources can be located at end node 406. Packet 408 has a destination as defined
by a BaseLID of a destination in network 400. In a given time window, each source can submit to
connection controller 402 the destination where it wants to send a packet. The sum of all of
these requests by a plurality of sources can be requested traffic pattern 403. Connection
controller 402, in particular packing algorithm 414, runs rearrangement algorithm 409 for network
400 and computes actual traffic pattern 412 using requested traffic pattern 403 and network
topology data 405, such that network 400 operates as a SNIN 419. Connection controller 402 then
has logical network state entity 420 communicate actual traffic pattern 412 to the source at end
node 406 corresponding to packet 408. Actual traffic pattern 412 can comprise a DLID 421 assigned
to packet 408 such that network 400 operates as a SNIN 419. QP mesh manager 422 at end node 406
can then assign a specific queue pair corresponding to the DLID 421.

[0069] In the given time window, once connection controller 402 has assigned DLIDs to all of
the packets corresponding to the requested traffic pattern 403, packet 408 follows a path through
at least a portion of plurality of INFINIBAND switches 401 toward its destination. The time
window can be, for example and without limitation, 1/60th of a second. Each of the portion of the
plurality of INFINIBAND switches forwards the packet 408 according to the DLID 421 assigned to
the packet 408. When packet 408 arrives at INFINIBAND switch 401, the DLID 421 in packet header
411 is looked up in forwarding table 415. DLID 421 is matched with one of the plurality of DLIDs
410 in forwarding table 415 and packet 408 is forwarded out of a port on INFINIBAND switch 401 to
another INFINIBAND switch according to set of forwarding instructions 413 corresponding to the
one of the plurality of DLIDs 410 matching the DLID 421 in packet header 411. The packet will
follow only the links designated in the routing tree corresponding to the DLID 421 assigned to
the packet. This is repeated at each of the portion of the plurality of INFINIBAND switches until
packet 408 reaches its destination end node. The process can be repeated for each subsequent time
window as long as network 400 is in operation. In another embodiment, each source can tell
connection controller 402 that it wants to operate during a given time frame. In this embodiment,
this data can be requested traffic pattern 403 and connection controller 402 can compute actual
traffic pattern 412 so that network 400 operates as SNIN 419.

[0070] The above process of populating forwarding tables of INFINIBAND switches with paths
appropriate to make network 400 operate as a SNIN 419 works particularly well for a CLOS network.
However, as a CLOS network is instantiated, it is unlikely that all INFINIBAND switches will be
turned "ON" simultaneously. As such, network 400 can pass through states in which it is not a
CLOS network. Therefore, the above methodology can be implemented in a non-CLOS network as well,
where the populating of forwarding tables occurs after each change in topology of network 400.

[0071] FIG. 5 illustrates a flow diagram 500 of a method of the invention according to an
embodiment of the invention. In step 502, a plurality of routing trees are calculated for a
plurality of INFINIBAND switches in a network. In an embodiment, calculating the plurality of
routing trees comprises, for each spine node in the network, calculating a shortest path from
the spine node to each of the plurality of sources and each of the plurality of destinations. In
an embodiment, the network is a CLOS network. Each of the plurality of routing trees can comprise
at least a portion of the plurality of INFINIBAND switches and corresponding plurality of links
that form a shortest path from one of the plurality of sources or one of the plurality of
destinations to a spine node of the CLOS network.

[0072] In step 504, a plurality of DLIDs and a set of forwarding instructions are calculated for
each of the plurality of INFINIBAND switches, wherein each of the plurality of DLIDs corresponds
to one of the plurality of routing trees and one of a plurality of destinations. In step 506, a
forwarding table of each of the plurality of INFINIBAND switches in the CLOS network is populated
with the plurality of DLIDs and the set of forwarding instructions.

[0073] FIG. 6 illustrates a flow diagram 600 of a method of the invention according to another
embodiment of the invention. In an embodiment, the method illustrated in FIG. 6 illustrates one
embodiment for calculating a plurality of routing trees from a plurality of spanning trees and
programming and populating forwarding tables at a plurality of INFINIBAND switches with DLIDs and
corresponding sets of forwarding instructions such that a network can operate as a SNIN. The
method is particularly suited to, but not limited to, rearrangeably non-blocking, multistage CLOS
networks.



[0074] In step 602, a plurality of end nodes, INFINIBAND switches and links define a plurality
of spanning trees. In step 604, one of the plurality of spanning trees is selected as the
current spanning tree. In step 606, one of the plurality of end nodes is selected as the current
end node. In step 608, one of the plurality of INFINIBAND (IB in FIGS. 6 and 7) switches is
selected as the current INFINIBAND switch.

[0075] In step 610, the current DLID is calculated to be the BaseLID of the current end node
plus the tree ID of the current spanning tree. In step 612, the current outgoing port is set
equal to the outgoing port from the current INFINIBAND switch which moves a packet closer to the
current end node, given that only links in the current spanning tree can be used. Step 614
represents one embodiment of the invention that includes populating the current INFINIBAND
switch's forwarding tables such that the current INFINIBAND switch forwards packets with the DLID
equaling the current DLID on an outgoing port equal to the current outgoing port. In another
embodiment of the invention, step 614 is not included and an additional step at the end of the
flow diagram in FIG. 6 is included to populate the forwarding tables with a plurality of DLIDs
and a set of forwarding instructions. In other words, in this alternate embodiment, the
forwarding tables are populated only after the plurality of routing trees, plurality of DLIDs and
set of forwarding instructions are all calculated.

[0076] In step 616, it is determined if the current INFINIBAND switch is the last of the
plurality of INFINIBAND switches. If not, the current INFINIBAND switch is set equal to the next
of the plurality of INFINIBAND switches per step 618 and the process returns to step 610. This
process repeats until, in step 616, the current INFINIBAND switch is the last of the plurality of
INFINIBAND switches, at which time the process moves to step 620. In other words, for a given
spanning tree and a given end node, each INFINIBAND switch in the network is processed per steps
610-614.

[0077] In step 620, it is determined if the current end node is the last of the plurality of end
nodes. If not, the current end node is set equal to the next of the plurality of end nodes per
step 622 and the process returns to step 608. This process repeats until, in step 620, the
current end node is the last of the plurality of end nodes, at which time the process moves to
step 624. In other words, for a given spanning tree, each end node in the network is processed
per steps 610-614.

[0078] In step 624, it is determined if the current spanning tree is the last of the plurality of
spanning trees. If not, the current spanning tree is set equal to the next of the plurality of
spanning trees per step 626 and the process returns to step 606. This process repeats until, in
step 624, the current spanning tree is the last of the plurality of spanning trees, at which time
the process of FIG. 6 is completed. At the completion of the process of FIG. 6, the forwarding
tables of each of the plurality of INFINIBAND switches are populated with a plurality of DLIDs
and the set of forwarding instructions such that a packet arriving at an INFINIBAND switch can be
forwarded such that the network operates as a SNIN.
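
The nested iteration of FIG. 6 can be sketched, by way of example and without limitation, as
three loops over spanning trees, end nodes, and switches. In this sketch an outgoing port is
represented by the name of the neighboring node it leads to, spanning trees are given as lists of
links keyed by tree ID, and the names next_hop_in_tree and populate_forwarding_tables, together
with the small topology, are assumptions made for illustration.

```python
from collections import deque


def next_hop_in_tree(tree_links, switch, end_node):
    """Neighbor of `switch` on the tree path toward `end_node`, or None if
    `switch` is not part of this tree (stands in for step 612)."""
    adjacency = {}
    for a, b in tree_links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    # Search outward from the end node; each node's parent is its next hop back.
    parent = {end_node: None}
    queue = deque([end_node])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent.get(switch)


def populate_forwarding_tables(spanning_trees, base_lids, switches):
    tables = {switch: {} for switch in switches}
    for tree_id, tree_links in spanning_trees.items():     # steps 604, 624, 626
        for end_node, base_lid in base_lids.items():        # steps 606, 620, 622
            for switch in switches:                          # steps 608, 616, 618
                dlid = base_lid + tree_id                    # step 610
                hop = next_hop_in_tree(tree_links, switch, end_node)  # step 612
                if hop is not None:
                    tables[switch][dlid] = hop               # step 614
    return tables


# Hypothetical network: two spine switches (S0, S1), two first stage switches
# (L0, L1), and one end node on each first stage switch (E0, E1).
spanning_trees = {
    0: [("S0", "L0"), ("S0", "L1"), ("L0", "E0"), ("L1", "E1")],
    1: [("S1", "L0"), ("S1", "L1"), ("L0", "E0"), ("L1", "E1")],
}
base_lids = {"E0": 16, "E1": 32}
tables = populate_forwarding_tables(spanning_trees, base_lids, ["S0", "S1", "L0", "L1"])
```

In this example, first stage switch L0 forwards DLID 32 toward spine S0 and DLID 33 toward spine
S1, so the choice of DLID alone selects which spine carries the packet to end node E1.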

[0079] FIG. 7 illustrates a flow diagram of a method of the invention according to yet another
embodiment of the invention. In step 702, a packet is created at a source in a network, wherein
the packet is addressed to a destination. Step 704 includes executing a rearrangement algorithm
for the network. Step 706 includes assigning one of a plurality of DLIDs to the packet. Step 708
includes the packet following a path through at least a portion of a plurality of INFINIBAND
switches from the one of the plurality of sources to the one of the plurality of destinations,
wherein each of the portion of the plurality of INFINIBAND switches forwards the packet according
to the one of the plurality of DLIDs assigned to the packet. Step 708 includes looking up the one
of the plurality of DLIDs assigned to the packet in the forwarding table at each of the portion
of the plurality of INFINIBAND switches along the path from the source to the destination. In
other words, each of the portion of the plurality of INFINIBAND switches forwards the packet in
accordance with the one of the plurality of DLIDs assigned to the packet as found in the
forwarding table at each of the portion of the plurality of INFINIBAND switches.

[0080] While we have shown and described specific embodiments of the present invention,
further modifications and improvements will occur to those skilled in the art. It is, therefore,
to be understood that the appended claims are intended to cover all such modifications and
changes as fall within the true spirit and scope of the invention.



